Introduction

Our data comes from Instagram accounts and was obtained from Kaggle.com.

It contains the username, following count, follower count, likes, comments, and location for posts from many different accounts.

We added several calculated columns to the data, such as engagement, engagement_quantile, post_timestamp, and caption_length.

This data is interesting because it covers a large sample of accounts (11,692 posts), which lets us look for patterns in engagement scores. We also compare engagement across follower groups, posting times, caption lengths, and post types (pictures vs. videos).

Problem Statement and Questions

We chose this data to understand Instagram engagement trends and the factors that contribute to engagement on picture and video posts.

What libraries did we use?

We used the tidyverse packages readr and dplyr to load and transform the data, lubridate to convert timestamps, and ggplot2 and plotly for the graphs.

Let's take a look at the unfiltered data

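library(tidyverse)  # read_csv(), glimpse(), mutate(), ntile(), ggplot2
library(lubridate)  # as_datetime()
insta_data <- read_csv("instagram_data.csv")
glimpse(insta_data)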
## Rows: 11,692
## Columns: 14
## $ owner_id        <chr> "36063641", "36063641", "36063641", "36063641", "36063…
## $ owner_username  <chr> "christendominique", "christendominique", "christendom…
## $ shortcode       <chr> "C3_GS1ASeWI", "C38ivgNS3IX", "C35-Dd9SO1b", "C33TadDM…
## $ is_video        <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ caption         <chr> "I’m a brunch & Iced Coffee girlie☕️🍳 \n\nTop @ta3 X …
## $ comments        <dbl> 268, 138, 1089, 271, 145, 143, 356, 132, 128, 884, 211…
## $ likes           <dbl> 16382, 9267, 10100, 6943, 17158, 9683, 42906, 4287, 74…
## $ created_at      <dbl> 1709326758, 1709241048, 1709154707, 1709065322, 170871…
## $ location        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ imageUrl        <chr> "https://instagram.flba2-1.fna.fbcdn.net/v/t39.30808-6…
## $ multiple_images <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
## $ username        <chr> "christendominique", "christendominique", "christendom…
## $ followers       <dbl> 2144626, 2144626, 2144626, 2144626, 2144626, 2144626, …
## $ following       <dbl> 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, …

Mutated Columns

Key terms

engagement - the engagement rate of a post as a percentage, calculated as (likes + comments) / followers * 100 and rounded to two decimal places.

follower_quantile - the follower count divided into four equal-sized groups (quartiles).

engagement_quantile - the engagement score divided into quartiles in the same way.

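post_timestamp - the date and time when the picture or video was posted, converted from created_at.

post_time - post_timestamp rounded to the nearest hour and formatted as HH:MM (for example, 21:00).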

caption_length - the number of words in the caption, counted by splitting the caption on spaces.

(In both quantile columns, 1 is the lowest group, 2 and 3 are the middle groups, and 4 is the highest group; for example, follower_quantile = 4 marks posts from the accounts with the most followers.)
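As a rough illustration of the quantile columns, the sketch below runs dplyr's ntile() on a short vector of follower counts. The counts are hypothetical toy values, not rows from our data. ntile() ranks the values and splits them into four groups of (nearly) equal size, leaving missing values as NA.

```r
library(dplyr)

# Hypothetical follower counts for eight posts, plus one missing value
followers <- c(3114, 263044, 2144626, 50000, 900000, 120000, 8000000, 400000, NA)

# Rank the counts and split them into 4 equal-sized groups:
# 1 = lowest quarter, 4 = highest quarter; the NA stays NA
follower_quantile <- ntile(followers, 4)
print(follower_quantile)
```

With eight non-missing values, each group gets two posts, so the result is 1, 2, 4, 1, 3, 2, 4, 3, NA.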

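# Add calculated columns; engagement = (likes + comments) per follower, as a percentage
new_data <- insta_data %>%
  mutate(engagement = round((likes + comments) / followers * 100, digits = 2),
         # split into 4 equal-sized groups: 1 = lowest, 4 = highest
         follower_quantile = ntile(followers, 4),
         engagement_quantile = ntile(engagement, 4),
         # created_at is a Unix timestamp in seconds; as_datetime() returns UTC
         post_timestamp = as_datetime(created_at),
         # round to the nearest hour and keep only HH:MM, e.g. "21:00"
         post_time = format(round(post_timestamp, units = "hours"), format = "%H:%M"),
         # word count: number of pieces after splitting the caption on spaces
         caption_length = lengths(strsplit(caption, " ")))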
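As a quick check of the engagement formula, the sketch below applies the same calculation to the likes, comments, and follower counts reported for the two example posts in the pictures-and-videos section. Only those numbers come from this report; the small tibble is just for illustration.

```r
library(dplyr)

# Likes, comments, and followers reported for the two example posts
examples <- tibble(
  account   = c("@therawtextures", "@maaren_xx"),
  likes     = c(909788, 3222),
  comments  = c(2683, 45),
  followers = c(263044, 3114)
)

# Same formula as the engagement column: (likes + comments) / followers * 100
examples <- examples %>%
  mutate(engagement = round((likes + comments) / followers * 100, digits = 2))
print(examples)
```

This gives 346.89% and 104.91%, in line with the 346% and 105% scores reported for those posts. Because the score divides by followers, it exceeds 100% whenever a post gets more likes and comments than the account has followers.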

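Data with Calculated Columns

The original data had no engagement score, follower groups, or readable post times (created_at is a raw Unix timestamp), so we added new columns with calculated values. No rows were removed: the data still has 11,692 rows, now with 20 columns.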

## Rows: 11,692
## Columns: 20
## $ owner_id            <chr> "36063641", "36063641", "36063641", "36063641", "3…
## $ owner_username      <chr> "christendominique", "christendominique", "christe…
## $ shortcode           <chr> "C3_GS1ASeWI", "C38ivgNS3IX", "C35-Dd9SO1b", "C33T…
## $ is_video            <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T…
## $ caption             <chr> "I’m a brunch & Iced Coffee girlie☕️🍳 \n\nTop @ta…
## $ comments            <dbl> 268, 138, 1089, 271, 145, 143, 356, 132, 128, 884,…
## $ likes               <dbl> 16382, 9267, 10100, 6943, 17158, 9683, 42906, 4287…
## $ created_at          <dbl> 1709326758, 1709241048, 1709154707, 1709065322, 17…
## $ location            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ imageUrl            <chr> "https://instagram.flba2-1.fna.fbcdn.net/v/t39.308…
## $ multiple_images     <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ username            <chr> "christendominique", "christendominique", "christe…
## $ followers           <dbl> 2144626, 2144626, 2144626, 2144626, 2144626, 21446…
## $ following           <dbl> 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 10…
## $ engagement          <dbl> 0.78, 0.44, 0.52, 0.34, 0.81, 0.46, 2.02, 0.21, 0.…
## $ follower_quantile   <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1,…
## $ engagement_quantile <int> 3, 2, 3, 2, 3, 2, 4, 2, 2, 4, 2, 2, 1, 3, 4, 2, 3,…
## $ post_timestamp      <dttm> 2024-03-01 20:59:18, 2024-02-29 21:10:48, 2024-02…
## $ post_time           <chr> "21:00", "21:00", "21:00", "20:00", "20:00", "20:0…
## $ caption_length      <int> 12, 34, 81, 57, 17, 66, 50, 17, 8, 53, 17, 20, 90,…

Key terms

is_video - TRUE if the post is a video, FALSE if it is a picture.

caption - the text written under the Instagram post.

comments/likes - the number of comments and likes each post received.

created_at - the time the post was created, stored as a Unix timestamp (seconds since 1970-01-01 UTC).

multiple_images - TRUE if the post was a carousel (multiple-image upload), FALSE otherwise.

followers/following - the number of accounts following the poster and the number of accounts the poster follows, at the time the data was extracted.
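To show how created_at relates to post_timestamp and post_time, the sketch below converts the first created_at value from the glimpse() output. For the hourly rounding it uses lubridate's round_date(), a swapped-in alternative to the round() call in our mutate() step that also rounds to the nearest hour.

```r
library(lubridate)

# First created_at value from the glimpse() output (seconds since 1970-01-01 UTC)
created_at <- 1709326758

# as_datetime() returns a UTC date-time by default
post_timestamp <- as_datetime(created_at)

# Round to the nearest hour and keep only HH:MM
post_time <- format(round_date(post_timestamp, unit = "hour"), format = "%H:%M")

print(post_timestamp)
print(post_time)
```

This prints 2024-03-01 20:59:18 UTC and 21:00, matching the first row of the post_timestamp and post_time columns.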

Account Follower Distribution

The average follower count in each follower quantile, where 1 is the lowest group and 4 is the highest. The NA row holds posts with a missing follower count, which ntile() leaves unassigned.

## # A tibble: 5 × 2
##   follower_quantile follower_mean
##               <int> <chr>        
## 1                 1 108,262      
## 2                 2 342,149      
## 3                 3 834,535      
## 4                 4 8,559,178    
## 5                NA NA
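One way a table like this could be produced is sketched below, using hypothetical toy follower counts rather than our data. The means are computed per quantile and then formatted with thousands separators for display.

```r
library(dplyr)

# Hypothetical follower counts for eight posts, plus one missing value
toy <- tibble(followers = c(3114, 263044, 2144626, 50000, 900000,
                            120000, 8000000, 400000, NA)) %>%
  mutate(follower_quantile = ntile(followers, 4))

# Average follower count per quantile (the NA quantile gets an NA mean)
follower_summary <- toy %>%
  group_by(follower_quantile) %>%
  summarise(follower_mean = round(mean(followers)))

# Display the means with thousands separators, e.g. 26557 -> "26,557"
follower_summary %>%
  mutate(follower_mean = formatC(follower_mean, format = "d", big.mark = ","))
```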

When do posts get the most engagement?

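This table shows the average engagement percentage and the number of posts (n()) for each posting hour. Hours are in UTC, because as_datetime() converts created_at to UTC by default rather than to each poster's local time.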

## # A tibble: 24 × 3
##    post_time `mean(engagement)` `n()`
##    <chr>                  <dbl> <int>
##  1 00:00                   2.08   261
##  2 01:00                   1.88   267
##  3 02:00                   1.71   238
##  4 03:00                   2.99   178
##  5 04:00                   2.37   138
##  6 05:00                   3.34   125
##  7 06:00                   2.38   145
##  8 07:00                   1.59   185
##  9 08:00                   3.81   223
## 10 09:00                   1.93   293
## # ℹ 14 more rows
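A minimal sketch of this kind of hourly summary is below, on hypothetical toy posts. It names the columns and sorts by average engagement so the strongest hours come first. Keeping the post count next to each average is useful, since an hour with few posts can show a high mean by chance.

```r
library(dplyr)

# Hypothetical toy posts: rounded posting hour and engagement (%)
posts <- tibble(
  post_time  = c("05:00", "05:00", "08:00", "09:00", "09:00", "09:00"),
  engagement = c(3.0, 4.0, 3.8, 1.0, 2.0, 3.0)
)

# Average engagement and number of posts for each hour, highest average first
hourly <- posts %>%
  group_by(post_time) %>%
  summarise(mean_engagement = mean(engagement), n_posts = n()) %>%
  arrange(desc(mean_engagement))
print(hourly)
```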

Post Engagement by Hour

We see the highest average engagement for posts made at 5am, 8am, 12pm, 1pm, 4pm, and 5pm (UTC).

What is the relationship between caption length and engagement?

Highest engagement posts include captions with lengths x & y.

The graph shows that posts with shorter captions tend to get more engagement.
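To illustrate how caption length can be compared with engagement, the sketch below counts words the same way as caption_length (splitting on spaces) and compares average engagement for short and long captions. The captions, engagement values, and the 10-word cutoff are all hypothetical.

```r
library(dplyr)

# Hypothetical toy posts with captions and engagement (%)
posts <- tibble(
  caption = c("Sunday brunch",
              "New drop today",
              "A long caption about my whole week and every single thing I did",
              "Coffee first"),
  engagement = c(2.4, 1.9, 0.6, 2.1)
)

by_length <- posts %>%
  # Word count: number of pieces after splitting on spaces
  mutate(caption_length = lengths(strsplit(caption, " ")),
         # Hypothetical cutoff separating short and long captions
         caption_group = if_else(caption_length <= 10, "short", "long")) %>%
  group_by(caption_group) %>%
  summarise(mean_engagement = mean(engagement), n_posts = n())
print(by_length)
```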

Which get more comments and likes: pictures or videos?

We see pictures get more comments and likes than videos.
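A comparison like this can be computed by grouping on is_video, as in the sketch below on hypothetical toy posts. Raw likes and comments favor large accounts, so comparing the engagement column by post type in the same way may give a fairer picture.

```r
library(dplyr)

# Hypothetical toy posts: FALSE = picture, TRUE = video
posts <- tibble(
  is_video = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  likes    = c(5000, 7000, 3000, 4000, 2000),
  comments = c(100, 140, 60, 50, 40)
)

# Average likes and comments for pictures vs. videos
by_type <- posts %>%
  group_by(is_video) %>%
  summarise(mean_likes = mean(likes), mean_comments = mean(comments), n_posts = n())
print(by_type)
```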

The following examples give us a glimpse of one of the videos with the highest engagement, as well as an image post.

The account @therawtextures had 263,044 followers at the time of data extraction, and this video received 909,788 likes and 2,683 comments, for an engagement score of 346%!


A post shared by Priyanshu Kumar (@therawtextures)

Below is an example of a single-image post that earned the creator @maaren_xx a 105% engagement score, with 3,222 likes and 45 comments on an account with 3,114 followers.


A post shared by Maren (@maaren_xx)

Summary

In summary, our presentation emphasized that our Instagram account data shows the following: